Doge is a sequence transformation model that uses Dynamic Mask Attention for sequence transformation, and can use either a Multi-Layer Perceptron or a Cross-Domain Mixture of Experts for state transformation.
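To make the attention idea concrete, here is a minimal PyTorch sketch of dynamic-mask-style attention: an input-dependent gate, derived here from the value states, is added to the static causal mask before the softmax so the model can learn to down-weight or drop key positions. The module layout, projection names, and choice of gating function are illustrative assumptions, not Doge's reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicMaskAttentionSketch(nn.Module):
    """Minimal sketch of attention with an input-dependent mask.

    The layer names and the log-sigmoid gate are illustrative
    assumptions; they are not Doge's actual implementation.
    """

    def __init__(self, dim: int, num_heads: int):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.out = nn.Linear(dim, dim)
        # Hypothetical per-position gate computed from the value states.
        self.gate = nn.Linear(self.head_dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape each tensor to (B, heads, T, head_dim).
        shape = (B, T, self.num_heads, self.head_dim)
        q, k, v = (t.view(shape).transpose(1, 2) for t in (q, k, v))

        scores = q @ k.transpose(-2, -1) / self.head_dim**0.5

        # Static causal mask: -inf above the diagonal blocks attention
        # to future positions.
        causal = torch.full((T, T), float("-inf"), device=x.device).triu(1)

        # Dynamic mask: a value-dependent log-gate in (-inf, 0] for each
        # key position, letting the model suppress uninformative tokens.
        dyn = F.logsigmoid(self.gate(v)).transpose(-2, -1)  # (B, H, 1, T)

        attn = torch.softmax(scores + causal + dyn, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, T, C)
        return self.out(out)
```

Because the dynamic term is purely additive on the attention logits, a gate of 0 recovers ordinary causal self-attention, which is one way such a mechanism can be initialized and then learned.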